Abstract  The majority of today’s Internet applications rely on
point-to-point transmission. In recent years, however, multicast
transmission has become the foundation for such applications as
multiparty video conferencing, distributed interactive simulations,
and collaborative systems. We describe a novel protocol to coordinate
multipoint groupwork in the IP-multicast framework. The protocol
supports Internet-wide coordination for large and highly interactive
groupwork, relying on transmission of coordination directives between
group members across a shared end-to-end multicast tree. We also
describe how addressing extensions to IP multicast can be put to use
for our multisite coordination mechanism.

Keywords: group coordination, multicast, collaboration.

1  Introduction 

Internet computing is gradually migrating from the standard unicast
transmission model to multicasting. In the IP multicasting model [3],
a source needs to send a packet only once to the network interface,
and multicast routers replicate the packet on its transmission path
to multiple receivers. The Internet Group Management Protocol (IGMP)
allows a host to join a multicast group by asking its local router to
forward multicast traffic for this group to the leaf subnetwork where
the host resides. Protocols such as DVMRP, MOSPF, and PIM [7]
construct multicast delivery trees and enable packet forwarding
between routers.

With IP multicast, a message is delivered on a best-effort basis to
all members of a multicast group, with no guarantees of reliable or
order-preserving delivery. These shortcomings have spurred much
research on reliable multicast between end hosts, and on mechanisms
to refine IP multicast, such as using addressing information to
enable subcasting or anycasting [9]. Subcasting delivers or retrieves
data between a source and select members of a multicast group, and
anycasting transfers data to any one member of a group, for example
the nearest proxy from a group of servers. While IGMP targets group
membership, and multicast routing protocols are concerned with
delivery, no protocols exist to tackle an emerging problem of
multisite communication: group coordination. This problem surfaces
especially in tightly coupled sessions featuring explicit conference
membership control.

Group coordination denotes services that support distributed hosts in
coordinating their joint activities, including synchronization of
flows from different sources, ordered delivery of distributed event
information, and the concurrent use of and access to shared
resources, referred to as floor control [4].

Early paradigms of group coordination, such as mutual exclusion [13]
or concurrency control [2], have been restricted to discrete data
domains rather than multimedia contents, use locking to manifest
control, and have not been deployed on an Internet scope. We discuss
a novel group coordination scheme, called Aggregated Coordination
Protocol (AGP), which operates on a shared multicast tree and uses
the underlying tree structure to store and forward coordination
primitives between hosts in different multicast groups on the tree.
AGP coordinates distributed activities via message passing, and
manifests control by ephemeral permissions rather than actual locks,
allowing control over continuous media flows as well as discrete
data.

Online group coordination in relation to face-to-face meetings has
been studied by Isaacs et al. [8], who show that greediness for time
and bandwidth on the part of non-cooperative users, as well as the
lack of social cues such as eye contact, contribute to coordination
problems. While geared toward supporting humans in their
interactions, the concept of group coordination also applies to
agent-based interaction. Methods to mediate resource contention at
the user interface level have been discussed, for example, by Ellis
and Gibbs [5]. Many existing systems for online collaboration are
proprietary, sparsely documented, and limited to local area networks
or sessions with few users. Abdel-Wahab discussed an early prototype
of a token-based control mechanism for a shared workspace. An
alternative approach was proposed by Aguilar et al. [1], in which
distributed, task-activated floor control serves as a high-level
analogy to collision-sensing in channel access.

Ziegler et al. [17] researched packet-switched voice conferencing in
a broadcast and unicast setting. ITU standards T.120 [14] and H.320
for video conferencing are circuit-switched and designed for
conferences with few users. Yavatkar [16] proposed the MCP
coordination protocol between concurrent media flows based on token
passing. A framework for hierarchical collaboration, looking at
bandwidth and delay issues, was presented by Vin et al. [15].
Raymond [13] discussed a mutual exclusion algorithm operating on a
static logical propagation tree, in which nodes maintain a pointer to
the neighbor node that leads to the current token holder with access
to the critical section. The algorithm by Neilsen and Mizuno [11]
adds a dynamic link between a requester and the token holder to avoid
backtracked routing of reply messages between a source and target
node.

We are interested in the question of how the routing and end-to-end
geometry used for multicasting data can be used effectively for
coordinating the activities among individuals and groups in sessions
of Internet scope. The rest of this paper is organized as follows:
Section 2 presents the system model and assumptions. Section 3
discusses AGP in its operation and correctness. Section 4 summarizes
the benefits of tree-based group coordination.

2  Model  and  Definitions 

We define a coordination session $C_s = (H, L)$ as a computer network
with hosts $H$ and links $L \subseteq H \times H$. Communication is
by message passing only, and we assume that messages eventually
arrive correctly. Each host in a session is client and server for
coordination primitives (CPs) to other hosts. CPs are issued between
hosts to synchronize their joint tasks, to implement causal or total
ordering of distributed events, and to mediate access to shared but
exclusive resources. Coordination management must be aligned with
membership operations such as joining, leaving, or splitting groups.

The entities involved are users (processes), resources at the various
hosts, and the CPs coordinating them. Users assume social roles
(moderator, panelist, student), and both users and agents assume
control roles (controlling who may work with a resource, or holding
the right to work with it). We call the host controlling access to
and operations on a resource the floor controller (FC) for that
resource. The host permitted to access the resource at a given moment
is called the floor holder (FH). Resources can be located at a
specific end host (camera), in replicated form at every host of a
multicast group handling the same resource (telepointer), or in the
network (voice channel). We distinguish between generic CPs
(“grant”, “release”, “open”) and resource- or media-specific CPs
(“rotate left”, “zoom in”). The temporary privilege to work with a
multimedia resource is also called a floor. CPs contain the sender
id, the receiver ids, the time of creation, the allowed duration, and
an optional priority value.

Control over shared resources can be centralized, distributed, or a
hybrid of both. It can be performed successively by individual
session members; partitioned, where various session hosts contribute
different control functions; or democratic, where consensus is
achieved by negotiation, yielding a new consistent control state. A
host holds a floor in its turn for a time interval T, which can be
preset or timed out. In our model, we assume that the FC role is
associated with a specific host, but it may infrequently rove among
hosts. The FH changes at every turn. We assume that each host in a
session runs the same coordination protocol, serving requests from
other hosts and transmitting requests for resources placed by users
or their processes.

3  Aggregated  Coordination  Protocol

3.1  Description 

The Aggregated Coordination Protocol (AGP) operates on a control tree
consisting of three main types of nodes: holder nodes host the FH,
operate on a resource, and are transmission sources; control nodes
host the FC for a specific resource, and are addressed by other nodes
asking for a floor; target nodes receive updates of the operations by
an FH. Nodes on the path from a holder to its targets are referred to
as hop nodes. Leaf nodes terminate downward forwarding of control
information in the tree.

CPs are disseminated across a single shared tree connecting all hosts
in a session. The tree corresponds to a single shared acknowledgment
tree for concurrent multicasting, allowing multiple sources instead
of building separate dissemination trees per source and multicast
group. The tree can be a working copy of the reliable multicast tree
prepared by a protocol such as Lorax [10]. We outline tree
maintenance for the case that group coordination is deployed in a
network complementing an existing underlying reliable multicast tree.

Various mechanisms for initiation, joining, and advertising of
collaborative sessions exist. We assume that one host, representing
itself or a multicast group, initiates a session and advertises the
session description, multicast address, and media in use in a session
directory [6]. The directory serves as a rendezvous interface, which
allows other hosts to join via a call-up mechanism. The control tree
is grown from the initiating node as the root. Other hosts joining
the session are at first considered off-tree and unicast
request-to-join messages to the inviting root node, based on
addressing information provided by the session directory. A TTL
(time-to-live) held in the join packet restricts the session scope.

A successful adoption of a child node into the control tree is
confirmed with a bind message. Each newly joined host, as the root of
its subtree, locally advertises invitation-to-join messages. Separate
subtrees may also merge to create a joint control tree. Each node in
the tree has a maximum degree D, which must be high enough to reflect
the session structure, but must not exceed the capacity of a host to
efficiently serve its children. Furthermore, if the tree also serves
as a media mixing hierarchy [12], the permissible height must satisfy
the end-to-end delay constraints.

Open sessions with dynamic membership may incur frequent joining and
leaving, or accidental withdrawal of hosts from the control tree.
When the root leaves the control tree, the eldest child in the
subtree is the designated new root. Age may be determined by
location, joining time, or address labels. A hop node that has lost
contact with its parent for longer than a timeout exceeding possible
congestion delays must rejoin the tree as described. CPs from a node
identified as lost are held at the processing host for a timeout and
discarded if the host does not reappear. A lost and rejoined host
must resend its pending CPs.

Routing of CPs in the tree is performed as follows: if a target node
is in the subtree of a node, the CP is routed down the subtree branch
where the target resides; otherwise it is sent up to the node's
parent. This operation could be performed by using only
directionality information on where a target is located, as proposed
in the previously mentioned mutual exclusion scheme by Raymond [13].
However, in contrast to a logical geometry used to propagate critical
section requests, our protocol must disseminate CPs involving many
resources, media types, and their coordination properties across an
infrastructure prescribed by the actual underlying multicast tree.

We therefore need a more expressive addressing and control mechanism
for multisite coordination among multicast groups, because many nodes
may assume FC or FH roles during a dynamic collaboration session. AGP
recursively assigns unique prefix labels, top-down, to each node
joining the control tree, such that a child node's label contains the
label of its parent as a prefix. For example, using a binary
alphabet, the root receives label 1, its children are numbered 10 and
11, etc. Binary labels were first used for routing of instructions in
multiprocessor systems and have recently been applied to multicast
routing and reliable multicasting [9]. Labels allow nodes to be
addressed individually, even though they are part of one or more
possibly overlapping multicast groups. Also, hosts can be placed in
the tree independent of their membership in different multicast
groups, in contrast to other approaches, which demand that the tree
organization reflect group membership.
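
The prefix labels and the up/down routing rule can be illustrated
with a minimal Python sketch. It is not part of the protocol
specification: it assumes that labels are strings with one digit per
tree level (so a node has at most ten children) and that the root
carries the label "1"; the function names are hypothetical.

```python
from typing import List, Optional


def child_label(parent: str, index: int) -> str:
    """Label of the index-th child: the parent's label plus one digit."""
    return parent + str(index)


def next_hop(current: str, target: str) -> Optional[str]:
    """Return the neighbor label a CP is forwarded to, or None on arrival."""
    if current == target:
        return None  # the CP has reached its target
    if target.startswith(current):
        # Target lies in our subtree: descend into the child whose label
        # extends ours by the next digit of the target label.
        return target[: len(current) + 1]
    if len(current) <= 1:
        # We are the root and the target is not below us.
        raise ValueError("target label is not on this tree")
    return current[:-1]  # otherwise hand the CP up to the parent


def route(source: str, target: str) -> List[str]:
    """List every label visited on the way from source to target."""
    path = [source]
    while (hop := next_hop(path[-1], target)) is not None:
        path.append(hop)
    return path
```

Under these assumptions, `route("12", "101")` visits the labels 12,
1, 10, and 101 in that order: the request climbs to the root and then
descends along the branch whose labels are prefixes of the target.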

Each CP contains the source’s label, the target label(s), a sequence
number, a local timestamp, a session-wide unique resource id obtained
from the session directory, and a floor id, which denotes the
temporary access permission or an activity descriptor for a resource.
The structure of a CP packet is shown in Figure 1. CPid identifies
the coordination primitive and the type of operation, characterizing
various resource modalities. TTL indicates the scope of the CP. Opt
is reserved for priority codings. The timestamp (Timest) and sequence
number (Seq#) tag the CP uniquely in the session and event space.
Checks is the checksum field. The Source addr and Target addrs fields
contain the labels for the sender of a CP and its target nodes.

[Figure 1 header fields, in order: V, CPid, Typ, TTL, Opt, Timest,
Seq#, Checks, Source addr, Target addrs ...]

Figure  1:  Packet  header  fields  for  coordination 
primitives  (CPs). 
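
As an illustration only, the header fields named in Figure 1 could be
held in a small Python record such as the one below. The field types,
the class name, and the `tag` helper are assumptions made for this
sketch; the text does not specify field widths or encodings, and the
meaning of the V and Typ fields is not spelled out.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CoordinationPrimitive:
    """Hypothetical in-memory view of the Figure 1 header fields."""

    version: int              # V (assumed here to be a version number)
    cp_id: int                # CPid: coordination primitive and operation
    typ: int                  # Typ (semantics not given in the text)
    ttl: int                  # TTL: scope of the CP
    opt: int                  # Opt: reserved for priority codings
    timestamp: float          # Timest: sender's local timestamp
    seq: int                  # Seq#: sequence number
    checksum: int             # Checks: checksum field
    source: str               # Source addr: label of the sender
    targets: Tuple[str, ...]  # Target addrs: labels of the target nodes

    def tag(self) -> Tuple[str, float, int]:
        """Timestamp and sequence number identify the CP; adding the
        sender's label is an assumption of this sketch."""
        return (self.source, self.timestamp, self.seq)
```

A record built this way is immutable, so a hop node can use `tag()`
as a dictionary key when it checks whether it has seen a CP before.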

Although the initial source and root will change over the course of a
session, the branching properties of the positional tree do not
change when the tree is virtually rehung as control roles shift. CPs
such as requests for the floor on a resource are propagated across
the tree using aggregation. This mechanism corresponds to the
solution for the ACK implosion problem, used in reliable multicast to
limit the feedback of receivers to a source on lost or corrupt
packets. Aggregation limits the control process to local groups,
rather than letting CPs that can be satisfied locally flow back all
the way to the holder or controller node. AGP packets are assumed to
be transferred reliably, which is guaranteed by the underlying
reliable multicast protocol. If AGP operates independently, it needs
to supply its own reliability mechanisms.

Hosts need to maintain the following state locally: the resource ids
shared from the local host to remote hosts, and the remote resource
ids accessed locally, together with their CPids; a state table
indicating which resources are locally available or held remotely,
together with the id of the remote FH; and a request queue ReqQ,
which collects successive CPs from different hosts (the queue length
is limited by the number of nodes in the session). If a hop node
receives the same CP from different nodes, it aggregates them into
one CP and checks whether these requests can be satisfied locally by
polling its own state and the state of neighboring nodes. Otherwise
the composite CP is self-routed up or down in the tree toward the
target node(s). This implies that the number of CPs required to
coordinate nodes decreases as the request activity increases, because
requests are not sent further once they reach a hop node that has
already processed the CP. The target address may also contain the
name of a multicast group, which is then resolved into its members
locally at the primary receiver node for this group. A joining node
retrieves the current control state for resources of concern by
polling its parent node.
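
The aggregation step at a hop node can be sketched as follows. This
is a simplified illustration, not the protocol's definition: it
treats two requests as "the same CP" when they share a CPid and a
resource id (an assumption), and it omits the check of whether the
request can be satisfied from local or neighbor state.

```python
from typing import Dict, List, Tuple

# Hypothetical key under which requests count as the same CP.
RequestKey = Tuple[str, str]  # (cp_id, resource_id)


class HopNode:
    """Collects identical requests so that only one copy travels on."""

    def __init__(self, label: str) -> None:
        self.label = label
        # Requesters waiting for an answer, per distinct request.
        self.pending: Dict[RequestKey, List[str]] = {}

    def receive(self, cp_id: str, resource_id: str, requester: str) -> bool:
        """Record a request; return True only if it must be forwarded."""
        key = (cp_id, resource_id)
        first_time = key not in self.pending
        self.pending.setdefault(key, []).append(requester)
        return first_time

    def resolve(self, cp_id: str, resource_id: str) -> List[str]:
        """Return and clear the requesters that await this response."""
        return self.pending.pop((cp_id, resource_id), [])
```

Only the first request for a given key is forwarded; later identical
requests are absorbed at the hop node, which answers all recorded
requesters once a response arrives.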

In addition, each node maintains a FIFO queue of pending CPids,
identified by the senders’ labels. A hop node receiving a floor
compares the label of the elected node with the head of the queue,
and self-elects if its own label matches the head; otherwise it
forwards the floor using the routing procedure outlined above. A
control node receiving a request responds immediately by sending a
grant message back to the requester if its local queue is empty;
otherwise it appends the request to its queue. When control shifts
from one node to another, the pending request queue is transferred to
the new control node, and its new address is multicast to all groups
sharing the associated resource.
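
A control node's queue handling might look like the following Python
sketch. It rests on assumptions not stated in the text: the node
tracks the current holder explicitly, a release message hands the
floor to the head of the queue, and the class and method names are
hypothetical. Multicasting the new controller address is left out.

```python
from collections import deque
from typing import Deque, Optional


class ControlNode:
    """Grants a floor at once when idle, otherwise queues the request."""

    def __init__(self, label: str) -> None:
        self.label = label
        self.holder: Optional[str] = None    # label of the current FH
        self.queue: Deque[str] = deque()     # FIFO of waiting requesters

    def request(self, requester: str) -> Optional[str]:
        """Return the requester's label if granted now, else None."""
        if self.holder is None and not self.queue:
            self.holder = requester
            return requester
        self.queue.append(requester)
        return None

    def release(self) -> Optional[str]:
        """Current holder gives up the floor; grant to the queue head."""
        self.holder = self.queue.popleft() if self.queue else None
        return self.holder

    def hand_over(self, new_label: str) -> "ControlNode":
        """Shift control: the pending queue moves to the new controller."""
        successor = ControlNode(new_label)
        successor.holder = self.holder
        successor.queue = self.queue
        self.queue = deque()
        return successor
```

With requests arriving from 12, 100, and 11 in that order, node 12 is
granted the floor immediately, and 100 and 11 are served in FIFO
order as the floor is released.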


3.2  Example

Consider a scenario where three hosts from three different multicast
groups MG1 - MG3 must coordinate access to a shared resource they are
contending for. Figure 2 depicts a snapshot of the protocol operation
in a ternary tree.

Figure 2: Snapshot of AGP operation among multicast groups MG1 - MG3.

Assume that hosts 12, 100, and 11 all request the floor held by the
FC at location 101. All request packets need to be routed along
branches of the tree to 101. The prefix property of the labels allows
self-routing of these packets. For example, assume that all hosts are
informed that host 101 is the FC. Host 12 compares its label with the
target label. Its prefix matches (1), but the second identifier
indicates that the FC is in subtree 0. The request packet is hence
sent upward to host 1, which compares its label with the target and,
detecting that 101 is one of its descendants, sends the packet to
host 10, whose label matches the prefix of 101.

This node performs the same comparison and the packet ultimately
arrives at the FC, which finally grants the floor to node 12. The
forwarding of control directives is aggregated, i.e., multiple
requests for the same information from different nodes in the tree
are assembled at a hop node that lies on the path to the target, and
are forwarded in combined form. This limits control traffic and
unnecessary propagation of requests that, for example, cannot be
satisfied at an FC node. The FC is hence relieved of the need to
communicate with every host in the session, and deals only with
relevant requests reaching it from neighbor nodes by self-routing.

3.3  Correctness

We outline correctness and resilience properties of AGP. Assuming
that CPs are sent reliably, a node can receive the privilege
associated with a CP only if it sends a request to the node
controlling the associated resource. A message enacting the CP (e.g.,
grant-floor) can only be sent by a controlling node. The addressing
labels in the tree are unique, and a CP is only issued for a specific
requesting node. Provided that there is a single holder node for a CP
a priori, the protocol will hence continue to assign a CP to exactly
one node.

If several nodes send a CP and another node holds the privilege
associated with it for an infinite time, a deadlock exists. Reasons
may be that the wrong node or no node holds the privilege, that the
holder is unreachable, or that the propagation of a request fails
prematurely. Assuming finite FIFO request queues at each node, a
request reaching a control node will eventually be served. For a tree
of height $h = 2$, we have a star-based scheme with $n = D + 1$
nodes. A floor will rove between the $D + 1$ nodes. Using induction
over the height of the tree, we assume that the liveness argument
holds for any height $l$ with $2 \le l < h$. A tree of height $h$ has
one additional level, from where CPs can be sent and must be replied
to. Because of the acyclic tree geometry, it is impossible for a CP
to be outrun by a moving floor privilege. Assume that a node in the
additional level of the tree (the root, or a leaf level of a subtree)
sends a CP. The addressing semantics of the protocol ensure that
messages are self-routed to this level of the tree and will
eventually reach the target node.

Finally, although privilege passing may progress during a session, a
node may starve by being excluded indefinitely from the allocation
process. However, the addressing semantics of AGP and the finite FIFO
queue prevent CPs accepted at a control node from being ignored
indefinitely. Based on these qualitative arguments we conjecture that
AGP is correct.

3.4  Fairness 

Fairness refers to the frequency and duration with which nodes
acquire a privilege on average over a given period, for example the
session lifetime or the lifetime of a shared resource in a session.
Network latency, geographic distance and location of nodes, or
varying host capabilities can all be factors in causing uneven
dissemination patterns of CPs and unfair allocation of floors. Leaf
nodes take more time to propagate their requests across the root to a
node on the other side of a tree than nodes just below the root.
Shared trees also do not provide shortest paths between a source and
its receiver set [7], which may cause increased latency in CP
transfer. It is hence important to establish service policies that
counteract these factors. One simple solution, a
“least-recently-served” policy, can be enacted by letting each node
maintain a local record of the most recent CPs and their originating
nodes. Nodes that do not appear on the list, or that rank last in
recency or frequency of service, are serviced first.
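
One possible reading of the “least-recently-served” policy is
sketched below in Python. It is an assumption-laden illustration: it
ranks waiting nodes by the time of their last service only (not by
frequency), uses a logical counter instead of wall-clock time, and
breaks ties by arrival order; the class name is hypothetical.

```python
from typing import Dict, Iterable, Optional


class LeastRecentlyServed:
    """Picks the waiting node that was served longest ago, or never."""

    def __init__(self) -> None:
        self.last_served: Dict[str, int] = {}  # node label -> service tick
        self.clock = 0                         # logical service counter

    def pick(self, waiting: Iterable[str]) -> Optional[str]:
        candidates = list(waiting)
        if not candidates:
            return None
        # Nodes never served get -1 and therefore rank before all others.
        chosen = min(candidates, key=lambda n: self.last_served.get(n, -1))
        self.clock += 1
        self.last_served[chosen] = self.clock
        return chosen
```

Repeated calls with the same waiting set cycle through the nodes, so
a node far from the control node is not passed over indefinitely in
favor of nodes whose requests merely arrive sooner.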

3.5  Resilience 

Previously, we assumed that transfer of CPs and accounting of control
information among nodes is failure free. Even if we assume reliable
multicasting to ensure that CPs are eventually transferred, the
control apparatus may need additional recovery mechanisms to ensure
consistency. This applies to regular node failure, control node
failure, link failure, or token loss or duplication. Such exceptions
can be preempted by redundant dissemination of status information, or
handled by detection of loss and recovery. Regular node or controller
failures are typically detected via timeout and recovered with an
election protocol, with neighbor nodes providing state updates.
Continuation of a split session is possible if the members in each
partition agree to continue, e.g., if a quorum exists.

One method to deal with the case that a CP is lost completely or
reaches only a subset of nodes is to multicast a CP_probe message
from a node $i$ to the session remainder. The time $t_r$ to receive a
response is bounded by twice the maximum time $t_{max}$ for a message
to traverse the longest link (probe and reply), plus the time
$t_{ack}$ for the receiver nodes to send a positive or negative
acknowledgment, $t_r = 2 t_{max} + t_{ack}$. If the CP is diagnosed
as lost, the controller node for the respective floor must regenerate
the token and send an update to the session.


3.6  Performance 

Attaching positional labels to nodes in a $D$-ary tree implies an
additional storage cost of $\log_2 D$ bits per level in a positional
tree of $N$ receivers and height $\log_D N$, i.e., $\lg N$ bits are
needed. Using 32-bit labels for designating sources and targets in
message headers, up to $2^{32}$ hosts can hence be accommodated.
Prefix comparison is cheaper for nodes close to the root due to
shorter labels. Serving a CP costs
$C_{cp} = C_{req} + C_{resp} + C_{upd}$, comprising the cost to send
a request to a control node, receive a response, and multicast an
update on the new state. We compare the delay in a unicast,
multicast, and aggregated multicast communication model under full
load (each node sends a CP), assuming that the host processing cost
for request, response, and update packets is equal and normalized.
The average path length between nodes is assumed to be the same for
all models. $\lambda$ represents the individual processing,
packetization, and transmission delay for each type of packet.

In unicast, the coordination delay incurs $(N - 1)$ requests, replies
from control nodes, and updates, where $N$ is the current session
size, i.e., $CD_{uc} = 3(N - 1)\lambda$. In multicast, $(N - 1)$
nodes send requests, and the control node multicasts one reply and
one update back to the session, i.e., $CD_{mc} = (N + 1)\lambda$. In
aggregated multicast, CPs are handled within multicast groups and
only the root of a group forwards a composite request to its parent,
or responds to group-local requests if it holds the information
locally. With $K$ groups we have on average
$G = \lceil N/K \rceil$ members per group, and per group there are
$G$ requests inside a group, $K$ aggregated requests sent to a
control node from all groups, and one multicast response and update,
i.e., $CD_{amc} = ((G - 1) + (K - 1) + 2)\lambda = (G + K)\lambda$.
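
The three delay expressions can be compared numerically with a short
Python sketch. It reads them as $3(N-1)\lambda$ for unicast,
$(N+1)\lambda$ for multicast, and $(G+K)\lambda$ for aggregated
multicast, and assumes that the group size $G$ is $N/K$ rounded up;
the function names and the sample values are chosen for illustration
only.

```python
import math


def cd_unicast(n: int, lam: float = 1.0) -> float:
    """(N - 1) requests, replies, and updates, each costing lam."""
    return 3 * (n - 1) * lam


def cd_multicast(n: int, lam: float = 1.0) -> float:
    """(N - 1) requests plus one multicast reply and one update."""
    return (n + 1) * lam


def cd_aggregated(n: int, k: int, lam: float = 1.0) -> float:
    """(G - 1) group-local and (K - 1) aggregated requests, plus a
    multicast response and update; G = ceil(N / K) is an assumption."""
    g = math.ceil(n / k)
    return (g + k) * lam


if __name__ == "__main__":
    n, k = 1000, 100  # illustrative session size and group count
    print(cd_unicast(n), cd_multicast(n), cd_aggregated(n, k))
```

For these sample values the normalized delays are 2997 for unicast,
1001 for multicast, and 110 for aggregated multicast, which matches
the ordering the text argues for.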

Figure 3 shows the average cost to coordinate hosts in sessions up to
size $N = 1000$, clustered into $K = N/10$ groups, and with
normalized transmission delay $\lambda$. It illustrates the benefits
of aggregated multicast coordination.

Figure 3: Coordination cost for various communication models.

4  Conclusions

The IP multicast model lacks refined support for intergroup
coordination. We have outlined the main issues in establishing a
coordination mechanism based on IP multicast, as it can be of use for
collaborative Internet or intranet applications. The proposed
protocol, AGP, is based on a logical control tree and scales to large
groups. Addressing extensions introduced to end-to-end multicasting
have been put to use for our multisite coordination mechanism to
facilitate efficient routing of CPs and subcasting to subsets of
multicast groups.

AGP is a standalone mechanism, but can rely on an underlying
end-to-end reliable multicast tree. This approach eliminates the need
to build a separate control structure for tracking, routing,
withholding, or forwarding control directives.